Combining Local Feature Scoring Methods for Text Categorization
Abstract
Dimensionality reduction is an important process in text categorization, and feature scoring methods are used to achieve it: each feature is scored, and features are selected according to a threshold. In this paper, we propose combining pairs of high-performing feature scoring methods to enhance text categorization. We analyzed three operators for constructing the combination, namely the union operator (UN), the union-cut operator (UC), and the intersection operator (INT), with the aim of increasing confidence in the selected features. The results suggested that these combining operators, when applied to feature selection methods with comparable performance, achieve an improvement. Generally, the UC operator gave the largest improvement in classifying frequent categories, whereas the UN operator was effective in classifying rare categories. Additionally, the INT operator showed some potential for reducing storage and improving performance.
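The three operators can be illustrated with a minimal Python sketch. It is not the authors' implementation. UN and INT are taken as the plain set union and intersection of the top-k features chosen by each scoring method. The abstract does not define UC, so the version here (interleave the two rankings, then cut back to k distinct features) is only an assumption. The names `top_k`, `combine`, `scores_a`, `scores_b`, and `k` are hypothetical.

```python
def top_k(scores, k):
    """Return the k highest-scoring features, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def combine(scores_a, scores_b, k, op):
    """Combine the top-k selections of two feature scoring methods."""
    top_a, top_b = top_k(scores_a, k), top_k(scores_b, k)
    if op == "UN":
        # Union: keep a feature if either method selects it.
        return set(top_a) | set(top_b)
    if op == "INT":
        # Intersection: keep a feature only if both methods select it.
        return set(top_a) & set(top_b)
    if op == "UC":
        # ASSUMPTION (not from the paper): interleave both rankings,
        # skip duplicates, then cut the merged list back to k features.
        merged = []
        for feat_a, feat_b in zip(top_a, top_b):
            for feat in (feat_a, feat_b):
                if feat not in merged:
                    merged.append(feat)
        return set(merged[:k])
    raise ValueError(f"unknown operator: {op}")


# Toy scores for one category from two hypothetical scoring methods.
scores_a = {"ball": 0.9, "goal": 0.8, "team": 0.3, "the": 0.1}
scores_b = {"goal": 0.7, "team": 0.6, "ball": 0.2, "the": 0.05}

for op in ("UN", "UC", "INT"):
    print(op, sorted(combine(scores_a, scores_b, k=2, op=op)))
```

Under these definitions UN can return up to 2k features, UC returns at most k, and INT returns at most k and often fewer, which is consistent with the storage reduction the abstract attributes to INT.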
Similar resources
A Study of Local and Global Thresholding Techniques in Text Categorization
Feature Filtering is an approach that is widely used for dimensionality reduction in text categorization. In this approach feature scoring methods are used to evaluate features leading to selection. Thresholding is then applied to select the highest scoring features either locally or globally. In this paper, we investigate several local and global feature selection methods. The usage of Standar...
Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA
With the explosive growth in amount of information, it is highly required to utilize tools and methods in order to search, filter and manage resources. One of the major problems in text classification relates to the high dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of features space. There are many feature selection methods. However...
Optimally Combining Positive and Negative Features for Text Categorization
This paper presents a novel local feature selection approach for text categorization. It constructs a feature set for each category by first selecting a set of terms highly indicative of membership as well as another set of terms highly indicative of non-membership, then unifying the two sets. The size ratio of the two sets was empirically chosen to obtain optimal performance. This is in contra...
An Optimal Approach to Local and Global Text Coherence Evaluation Combining Entity-based, Graph-based and Entropy-based Approaches
Text coherence evaluation becomes a vital and lovely task in Natural Language Processing subfields, such as text summarization, question answering, text generation and machine translation. Existing methods like entity-based and graph-based models are engaging with nouns and noun phrases change role in sequential sentences within short part of a text. They even have limitations in global coheren...
A General Investigation on the Combination of Local and Global Feature Selection Methods for Request Identification in Telegram
Nowadays, the use of various messaging services is expanding worldwide with the rapid development of Internet technologies. Telegram is a cloud-based open-source text messaging service. According to the US Securities and Exchange Commission and based on the statistics given for October 2019 to present, 300 million people worldwide used telegram per month. Telegram users are more concentrated in...
Publication date: 2006